December 4, 2017

Scenario: A company approaches you to predict data scientist salaries with machine learning.

Let's predict data scientist salaries

What is Machine Learning?

Machine learning is a method for teaching computers to make and improve predictions or behaviours based on data.

Step 1: Find some data

Step 2: Throw ML at your data

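library(mlr)

set.seed(42)  # make the random forest reproducible
# Regression task: predict CompensationAmount from the other survey columns
task = makeRegrTask(data = survey.dat, target = 'CompensationAmount')
# Random forest learner; importance = TRUE makes the forest compute variable importance
lrn = makeLearner('regr.randomForest', importance = TRUE)
mod = train(lrn, task)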

Step 3: Profit. We are done!

"There is a problem with the model!"

What problem?

"The older the applicants, the higher the predicted salary, regardless of skills."

Individual Conditional Expectation

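library(ggplot2)  # needed for scale_y_continuous()
# individual = TRUE returns one curve per observation instead of the average
ice = generatePartialDependenceData(mod, task, features = 'Age',
                                    individual = TRUE)
plotPartialDependence(ice) + scale_y_continuous(limits = c(0, NA))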

# Centered ICE: anchor every curve at its prediction for Age = 20
ice.c = generatePartialDependenceData(mod, task, features = 'Age',
          individual = TRUE, center = list(Age = 20))
plotPartialDependence(ice.c)
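
What the ICE plots compute, sketched by hand. This is only an illustration under assumptions not in the slides: the built-in mtcars data and a linear model with an interaction stand in for the survey data and the random forest, and the names ice_manual and ice_centered are made up for this sketch.

```r
# Illustrative stand-ins: mtcars and lm() replace survey.dat and the random forest
fit <- lm(mpg ~ wt * hp, data = mtcars)

# Grid of values for the feature of interest
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)

# One ICE curve per observation: set wt to each grid value,
# keep all other features as observed, and predict
ice_manual <- sapply(grid, function(v) {
  newdata <- mtcars
  newdata$wt <- v
  predict(fit, newdata = newdata)
})
# ice_manual has one row per observation and one column per grid value

# Centered ICE: subtract each curve's prediction at the anchor (first grid value)
ice_centered <- ice_manual - ice_manual[, 1]

matplot(grid, t(ice_manual), type = "l", lty = 1, col = "grey40",
        xlab = "wt", ylab = "Predicted mpg")
```

Each line in the plot is one observation. Centering makes it easier to compare how the curves diverge from a common starting point.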

Partial dependence plots

pdp = generatePartialDependenceData(mod, task, features =c('Age'))
plotPartialDependence(pdp) + scale_y_continuous(limits=c(0, NA))
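
A partial dependence curve is the average of the ICE curves. A minimal sketch by hand, again assuming mtcars and a linear model as stand-ins for the survey data and the random forest (pdp_manual is a name invented for this sketch):

```r
# Illustrative stand-ins: mtcars and lm() replace survey.dat and the random forest
fit <- lm(mpg ~ wt * hp, data = mtcars)

# Grid of values for the feature of interest
grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 10)

# For each grid value: set wt to that value for every observation,
# predict, and average the predictions over the data set
pdp_manual <- sapply(grid, function(v) {
  newdata <- mtcars
  newdata$wt <- v
  mean(predict(fit, newdata = newdata))
})

plot(grid, pdp_manual, type = "l", xlab = "wt", ylab = "Average predicted mpg")
```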

"We want to understand the model better!"

Permutation feature importance

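library(tidyr); library(dplyr)  # gather(), %>%, arrange()
# type = 1: permutation importance (increase in MSE when a feature is permuted)
feat.imp = getFeatureImportance(mod, type = 1)$res
# Reshape to long format and sort so the plot is ordered by importance
dat = gather(feat.imp, key = 'Feature', value = 'Importance') %>% arrange(Importance)
dat$Feature = factor(dat$Feature, levels = dat$Feature)
ggplot(dat) + geom_point(aes(y = Feature, x = Importance))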
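
The idea behind permutation importance, sketched by hand: shuffle one feature, and see how much the prediction error grows. Assumptions for this sketch only: mtcars and a linear model as stand-ins, error measured on the training data (the random forest uses out-of-bag data instead), and 20 shuffles per feature as an arbitrary choice.

```r
set.seed(42)

# Illustrative stand-ins: mtcars and lm() replace survey.dat and the random forest
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Mean squared error of a model on a data set
mse <- function(model, data) {
  mean((data$mpg - predict(model, newdata = data))^2)
}
baseline <- mse(fit, mtcars)

features <- c("wt", "hp", "qsec")

# Importance of a feature = average increase in MSE after shuffling that feature
perm_importance <- sapply(features, function(f) {
  increases <- replicate(20, {
    shuffled <- mtcars
    shuffled[[f]] <- sample(shuffled[[f]])  # break the link between feature and target
    mse(fit, shuffled) - baseline
  })
  mean(increases)
})

sort(perm_importance, decreasing = TRUE)
```

A feature the model relies on heavily produces a large increase in error when shuffled. A feature the model ignores produces an increase near zero.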

Gender?!

pdp = generatePartialDependenceData(mod, task, features =c('Gender'))
ggplot(pdp$data) + geom_point(aes(x=Gender, y=CompensationAmount)) + 
  geom_segment(aes(x=Gender, xend=Gender, yend=CompensationAmount), y=0) + 
  scale_y_continuous(limits=c(0, NA)) + 
  theme(axis.text.x = element_text(angle = 10, hjust = 1))

LIME

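library(lime)
dat = getTaskData(task)
# lime() builds the explainer from the training data and the model
explainer <- lime(dat, mod)
# Explain a single instance (row 3) with 3 features
explanation <- lime::explain(dat[3, ], explainer, n_features = 3)
plot_features(explanation, ncol = 1)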
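
The core idea of LIME, sketched by hand: perturb the instance, weight the perturbed points by how close they are to it, and fit a simple weighted linear model to the black-box predictions. This is a simplified illustration, not what the lime package does internally. Assumptions: mtcars and a linear model with an interaction as the "black box", Gaussian perturbations, and an exponential kernel with an arbitrary kernel_width.

```r
set.seed(42)

# Illustrative stand-in for the black-box model
black_box <- lm(mpg ~ wt * hp, data = mtcars)
features <- c("wt", "hp")
x0 <- mtcars[3, features]  # instance to explain

# 1. Sample perturbed points around the instance
n <- 500
sds <- sapply(mtcars[features], sd)
perturbed <- data.frame(
  wt = rnorm(n, mean = x0$wt, sd = sds["wt"]),
  hp = rnorm(n, mean = x0$hp, sd = sds["hp"])
)

# 2. Ask the black box for predictions at the perturbed points
perturbed$pred <- predict(black_box, newdata = perturbed)

# 3. Weight each point by its proximity to the instance (standardized distance)
z <- sweep(as.matrix(perturbed[features]), 2, unlist(x0), "-")
z <- sweep(z, 2, sds, "/")
d <- sqrt(rowSums(z^2))
kernel_width <- 0.75  # arbitrary choice for this sketch
w <- exp(-d^2 / kernel_width^2)

# 4. Fit an interpretable weighted linear model; its coefficients are the explanation
local_fit <- lm(pred ~ wt + hp, data = perturbed, weights = w)
coef(local_fit)
```

The coefficients describe how the black-box prediction changes near this one instance. They are not a global description of the model.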

Interested in learning more?